Unsupervised Variable Selection: when random rankings sound as irrelevancy

Author

  • Sébastien Guérif
Abstract

Whereas variable selection has been extensively studied in the context of supervised learning, unsupervised variable selection has attracted researchers' attention only more recently, as the amount of available unlabeled data has exploded. Many unsupervised variable ranking criteria have been proposed, and their relevance is usually demonstrated using either external cluster validity indexes or the accuracy of a classifier, both of which are supervised criteria. However, the major issue of selecting a variable subset according to a ranking measure has been addressed by only a few authors in the unsupervised learning context. In this paper, we propose to combine multiple rankings to move toward a stable consensus variable subset in a totally unsupervised fashion.
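The idea of combining several rankings into a consensus subset can be illustrated with a minimal sketch. The choices below are assumptions made for illustration, not the method of the paper: each ranking comes from one bootstrap resample of the data, the unsupervised criterion is simply the variance of each variable, and the rankings are merged by averaging rank positions (a Borda-style aggregation). The helper names `variance_scores`, `ranking` and `consensus_subset` are hypothetical. The subset size `k` is fixed by hand here, whereas deciding where to cut a ranking without supervision is precisely the issue the abstract raises; this sketch does not attempt to reproduce how the paper resolves it.

```python
import random
from statistics import pvariance


def variance_scores(rows):
    """Unsupervised relevance score per variable: the variance of its column.
    Assumption for illustration only; any unsupervised criterion could be used."""
    n_vars = len(rows[0])
    return [pvariance([row[j] for row in rows]) for j in range(n_vars)]


def ranking(scores):
    """Return rank positions: rank[j] == 0 for the highest-scoring variable."""
    order = sorted(range(len(scores)), key=lambda j: -scores[j])
    rank = [0] * len(scores)
    for position, j in enumerate(order):
        rank[j] = position
    return rank


def consensus_subset(rows, k, n_rankings=50, seed=0):
    """Hypothetical helper: build one ranking per bootstrap resample, average
    the rank positions (Borda-style) and keep the k best variables on average.
    k is chosen by the caller; no automatic cut-off is attempted here."""
    rng = random.Random(seed)
    n_vars = len(rows[0])
    rank_sums = [0] * n_vars
    for _ in range(n_rankings):
        sample = [rng.choice(rows) for _ in rows]  # bootstrap resample
        for j, r in enumerate(ranking(variance_scores(sample))):
            rank_sums[j] += r
    mean_rank = [s / n_rankings for s in rank_sums]
    consensus = sorted(range(n_vars), key=lambda j: mean_rank[j])
    return consensus[:k], mean_rank


# Usage with synthetic (hypothetical) data: three variables with decreasing spread.
data_rng = random.Random(1)
rows = [[data_rng.gauss(0, 10), data_rng.gauss(0, 3), data_rng.gauss(0, 0.1)]
        for _ in range(200)]
selected, mean_rank = consensus_subset(rows, k=2)
print(selected)  # -> [0, 1]
```

Averaging rank positions rather than raw scores makes rankings from different resamples (or from different criteria) comparable, since only the order matters. Variables whose rank varies widely from one resample to the next are a sign that the selected subset is unstable.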


Similar resources

Letter to the Editor: On the stability and ranking of predictors from random forest variable importance measures

A recent study examined the stability of rankings from random forests using two variable importance measures (mean decrease accuracy (MDA) and mean decrease Gini (MDG)) and concluded that rankings based on the MDG were more robust than MDA. However, studies examining data-specific characteristics on ranking stability have been few. Rankings based on the MDG measure showed sensitivity to within-...

Stability of variable importance scores and rankings using statistical learning tools on single-nucleotide polymorphisms and risk factors involved in gene × gene and gene × environment interactions

Risk of complex disorders is thought to be multifactorial, involving interactions between risk factors. However, many genetic studies assess association between disease status and markers one single-nucleotide polymorphism (SNP) at a time, due to the high-dimensional nature of the search space of all possible interactions. Three ensemble methods have been recently proposed for use in high-dimen...

Fast Unsupervised Automobile Insurance Fraud Detection Based on Spectral Ranking of Anomalies

Collecting insurance fraud samples is costly and, if performed manually, very time consuming. This issue suggests the use of unsupervised models. One of the accurate methods in this regard is Spectral Ranking of Anomalies (SRA), which has been shown to work better than other methods specifically for auto insurance fraud detection. However, this approach is not scalable to large samples and is not appro...

Unsupervised Taxonomy of Sound Effects

Sound effect libraries are commonly used by sound designers in a range of industries. Taxonomies exist for the classification of sounds into groups based on subjective similarity, sound source or common environmental context. However, these taxonomies are not standardised, and no taxonomy based purely on the sonic properties of audio exists. We present a method using feature selection, unsuperv...

Bias and stability of single variable classifiers for feature ranking and selection

Feature rankings are often used for supervised dimension reduction especially when discriminating power of each feature is of interest, dimensionality of dataset is extremely high, or computational power is limited to perform more complicated methods. In practice, it is recommended to start dimension reduction via simple methods such as feature rankings before applying more complex approaches. ...

Publication date: 2008